{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Working with Parquet Files\n",
    "\n",
    "The easiest way to read a GeoParquet or Parquet file is with `sd.read_parquet()`. Alternatively, you can query these files in SQL by referencing their path directly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install SedonaDB\n",
    "\n",
    "Use pip to install SedonaDB from the Python Package Index (PyPI)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Note**: Before running this notebook on your local machine, you must have SedonaDB installed in your environment. You can install SedonaDB with the following command: `pip install \"apache-sedona[db]\"`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Implementation\n",
    "\n",
    "A typical workflow for GeoParquet and Parquet files is:\n",
    "\n",
    "1. **Load** the Parquet file into a DataFrame with `sd.read_parquet()`.\n",
    "2. **Register** the DataFrame as a view with `.to_view()`.\n",
    "3. **Query** the view with `sd.sql()`.\n",
    "4. **Write** the results to a Parquet file with `.to_parquet()`, or convert them to a pandas DataFrame or GeoPandas GeoDataFrame with `.to_pandas()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the sedona.db module and connect to SedonaDB\n",
    "import sedona.db\n",
    "\n",
    "sd = sedona.db.connect()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "┌──────────────┬───────────────────────────────┐\n",
      "│     name     ┆            geometry           │\n",
      "│     utf8     ┆            geometry           │\n",
      "╞══════════════╪═══════════════════════════════╡\n",
      "│ Vatican City ┆ POINT(12.4533865 41.9032822)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ San Marino   ┆ POINT(12.4417702 43.9360958)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Vaduz        ┆ POINT(9.5166695 47.1337238)   │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Lobamba      ┆ POINT(31.1999971 -26.4666675) │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Luxembourg   ┆ POINT(6.1300028 49.6116604)   │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Palikir      ┆ POINT(158.1499743 6.9166437)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Majuro       ┆ POINT(171.3800002 7.1030043)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Funafuti     ┆ POINT(179.2166471 -8.516652)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Melekeok     ┆ POINT(134.6265485 7.4873962)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Bir Lehlou   ┆ POINT(-9.6525222 26.1191667)  │\n",
      "└──────────────┴───────────────────────────────┘\n"
     ]
    }
   ],
   "source": [
    "# 1. Load the Parquet file\n",
    "df = sd.read_parquet(\n",
    "    \"https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/\"\n",
    "    \"natural-earth/files/natural-earth_cities_geo.parquet\"\n",
    ")\n",
    "\n",
    "# 2. Register the DataFrame as a view\n",
    "df.to_view(\"cities\")\n",
    "\n",
    "# 3. Query the view and store the result in a new DataFrame\n",
    "query_result_df = sd.sql(\"SELECT * FROM cities LIMIT 10\")\n",
    "query_result_df.show()"
   ]
  },
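  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of writing a Parquet file in step 4, you can convert the query result to a pandas object with `.to_pandas()`. The following cell is a minimal sketch of that option. Whether it returns a pandas `DataFrame` or a GeoPandas `GeoDataFrame` may depend on your environment (for example, whether GeoPandas is installed), so the cell prints the type to confirm. The variable name `result_pdf` is illustrative only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Alternative to step 4: export the query result to pandas.\n",
    "# Assumption: with GeoPandas installed, the geometry column comes back as a\n",
    "# GeoDataFrame; print the type to confirm what you get in your environment.\n",
    "result_pdf = query_result_df.to_pandas()\n",
    "print(type(result_pdf))\n",
    "result_pdf.head()"
   ]
  },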
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Verifying the written file at 'query_results.parquet'...\n",
      "┌──────────────┬───────────────────────────────┐\n",
      "│     name     ┆            geometry           │\n",
      "│     utf8     ┆            geometry           │\n",
      "╞══════════════╪═══════════════════════════════╡\n",
      "│ Vatican City ┆ POINT(12.4533865 41.9032822)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ San Marino   ┆ POINT(12.4417702 43.9360958)  │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Vaduz        ┆ POINT(9.5166695 47.1337238)   │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Lobamba      ┆ POINT(31.1999971 -26.4666675) │\n",
      "├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
      "│ Luxembourg   ┆ POINT(6.1300028 49.6116604)   │\n",
      "└──────────────┴───────────────────────────────┘\n"
     ]
    }
   ],
   "source": [
    "# 4. Write the result to a new Parquet file\n",
    "output_path = \"query_results.parquet\"\n",
    "query_result_df.to_parquet(output_path)\n",
    "\n",
    "# (Optional) Verify the written file\n",
    "print(f\"\\nVerifying the written file at '{output_path}'...\")\n",
    "verified_df = sd.read_parquet(output_path)\n",
    "verified_df.show(5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv (3.13.3)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
